A Topical Crawler for Uncovering Hidden Communities of Extremist Micro-Bloggers on Tumblr
Authors
Abstract
Research shows that microblogging websites such as Tumblr are being misused as platforms to disseminate hate and extremism. We formulate the problem of locating such extremist communities as a graph search problem. We propose a topical-crawler-based approach that performs several tasks: searching for a blogger, computing the blogger's similarity against exemplary documents, filtering hate-promoting bloggers, navigating through links to other bloggers, and managing a queue of such bloggers for social network analysis. We conduct experiments on a real-world dataset and examine the effectiveness of the 'like' and 'reblog' features as links between bloggers. Experimental results demonstrate that the proposed approach is effective, with an F-score of 0.80.
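The crawl loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the 0.3 threshold, the breadth-first queue, and the in-memory `posts` and `links` dictionaries (standing in for Tumblr data reached via 'like' and 'reblog' links) are all assumptions made for illustration, and the toy data uses neutral placeholder words.

```python
from collections import Counter, deque
import math


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def topical_crawl(seed, posts, links, exemplars, threshold=0.3, max_visits=100):
    """Breadth-first crawl that expands only bloggers judged on-topic.

    posts:     hypothetical dict, blogger -> concatenated post text
    links:     hypothetical dict, blogger -> bloggers linked via like/reblog
    exemplars: list of exemplary documents describing the topic
    threshold: assumed similarity cut-off, not taken from the paper
    """
    frontier = deque([seed])   # queue of bloggers waiting to be visited
    seen = {seed}              # avoids enqueueing a blogger twice
    relevant = []
    visits = 0
    while frontier and visits < max_visits:
        blogger = frontier.popleft()
        visits += 1
        text = posts.get(blogger, "")
        score = max(
            (cosine_similarity(text, doc) for doc in exemplars), default=0.0
        )
        if score < threshold:
            continue  # filtered out: this blogger's links are not followed
        relevant.append(blogger)
        for neighbor in links.get(blogger, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return relevant


# Toy graph with placeholder text; "c" is off-topic and gets filtered.
posts = {
    "a": "topic alpha beta gamma",
    "b": "alpha beta delta",
    "c": "unrelated cooking recipes",
    "d": "alpha gamma topic",
}
links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
exemplars = ["topic alpha beta gamma"]
result = topical_crawl("a", posts, links, exemplars)
print(result)
```

In this sketch a blogger below the threshold is dropped without expanding its links, which keeps the crawl on topic at the cost of missing relevant bloggers reachable only through off-topic ones.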
Similar resources
Spider and the Flies : Focused Crawling on Tumblr to Detect Hate Promoting Communities
Tumblr is one of the largest and most popular microblogging websites on the Internet. Studies show that, due to high reachability among viewers, low publication barriers, and social networking connectivity, microblogging websites are being misused by existing extremist groups as a platform to post hateful speech and recruit new members. Manual identification of such posts and communities is ov...
Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats
Research shows that various social media platforms on the Internet, such as Twitter and Tumblr (micro-blogging websites), Facebook (a popular social networking website), YouTube (the largest video sharing and hosting website), blogs, and discussion forums, are being misused by extremist groups for spreading their beliefs and ideologies, promoting radicalization, recruiting members and creating online virtual...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task, and an unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
A multi-region empirical study on the internet presence of global extremist organizations
Extremist organizations are heavily utilizing Internet technologies to increase their abilities to influence the world. Studying those global extremist organizations’ Internet presence would allow us to better understand extremist organizations’ technical sophistication and their propaganda plans. In this work, we explore an integrated approach for collecting and analyzing extremist Internet pr...
Rules Design in Word Segmentation of Chinese Micro-Blog
This paper proposes a Hidden Markov Model (HMM) based tokenizer for Chinese micro-blog texts. Compared with normal Chinese texts, micro-blog texts contain more uncertainties. These uncertainties are generally caused by bloggers' irregular usage (such as network words, dialect words, miswritten characters, mixtures of foreign words and symbols, etc.). Besides the lack of the annotated tr...